Genome-wide association studies (GWAS) can reveal important genotype-to-phenotype associations; however, data quality and interpretability issues must be addressed. The GWAX approach enables rational ranking, filtering, and interpretation of GWAS results via metrics, methods, and interactive visualization. Each inferred gene-to-trait association, linking a protein-coding gene and a phenotype, is evaluated for confidence and relevance, with scores derived solely from aggregated statistics. Applicability and thresholds will depend on the use case.
Issues/to-do:
Reported or mapped genes? Current guess: reported.
GWAS Catalog (http://www.ebi.ac.uk/gwas/) studies each have a study_accession. Each study is also associated with a publication (PubMed ID), but not uniquely. See https://www.ebi.ac.uk/gwas/docs/fileheaders.
Some key definitions:
- `REPORTED GENE(S)`*: Gene(s) reported by author
- `MAPPED GENE(S)`: Gene(s) mapped to the strongest SNP. If the SNP is located within a gene, that gene is listed. If the SNP is intergenic, the upstream and downstream genes are listed, separated by a hyphen.
- `OR or BETA`: Reported odds ratio or beta-coefficient associated with strongest SNP risk allele. Note that if an OR <1 is reported this is inverted, along with the reported allele, so that all ORs included in the Catalog are >1. Appropriate unit and increase/decrease are included for beta coefficients.
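
The inversion rule described for `OR or BETA` can be illustrated with a minimal sketch. The function name and the `other_allele` argument are illustrative assumptions and are not part of the GWAS Catalog; this only shows the arithmetic of re-orienting an odds ratio below 1.

```python
def normalize_odds_ratio(odds_ratio, reported_allele, other_allele):
    """Return (odds_ratio, allele) oriented so that the odds ratio is >= 1.

    Hypothetical helper: an OR below 1 is inverted and the allele is swapped,
    mirroring the convention described in the definition above.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratio must be positive")
    if odds_ratio < 1:
        # Invert the effect and report it for the opposite allele.
        return 1.0 / odds_ratio, other_allele
    return odds_ratio, reported_allele


print(normalize_odds_ratio(0.5, "A", "G"))  # (2.0, 'G')
print(normalize_odds_ratio(1.3, "A", "G"))  # (1.3, 'A')
```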
## [1] "Wed Jun 5 10:12:59 2019"
Input data are from the GWAS Catalog, TCRD (Target Central Resource Database), and EFO.
## [1] "Studies total: 5774 ; accessions: 5774 ; traits: 3472 ; PMIDs: 3596"
### Laboratory platforms
Grouped by vendor (first in list if multiple), though technologies may have evolved for a given vendor.
| JOURNAL | N_gwas | N_pmid | N_assn |
|---|---|---|---|
| Nat Genet | 797 | 522 | 16177 |
| PLoS One | 383 | 226 | 4966 |
| PLoS Genet | 352 | 178 | 6419 |
| Hum Mol Genet | 352 | 243 | 4154 |
| Nat Commun | 307 | 128 | 7928 |
| Am J Hum Genet | 162 | 85 | 3238 |
| Sci Rep | 149 | 79 | 1003 |
| Mol Psychiatry | 138 | 98 | 1970 |
| Circ Cardiovasc Genet | 88 | 54 | 623 |
| Am J Med Genet B Neuropsychiatr Genet | 80 | 47 | 1070 |
| Diabetes | 72 | 39 | 463 |
| Hum Genet | 72 | 51 | 1693 |
iCite annotations were retrieved from the iCite API for all PMIDs in the GWAS Catalog. New publications may lack an iCite RCR (Relative Citation Ratio). Should we impute RCR = median as a reasonable prior?
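
One way the median-imputation question could be prototyped is sketched below. This is not the GWAX implementation: the function name, the dictionary layout, and the placeholder PMID keys are assumptions made for illustration.

```python
from statistics import median


def impute_rcr(rcr_by_pmid):
    """Fill missing (None) RCR values with the median of the observed values."""
    observed = [v for v in rcr_by_pmid.values() if v is not None]
    if not observed:
        raise ValueError("no observed RCR values to impute from")
    fill = median(observed)
    return {pmid: (fill if v is None else v) for pmid, v in rcr_by_pmid.items()}


# Placeholder keys and values, not real publications.
rcr = {"pmid_a": 1.0, "pmid_b": 2.0, "pmid_c": 9.0, "pmid_new": None}
print(impute_rcr(rcr))
```

Because the RCR distribution reported here is right-skewed (mean above median), the median is a more conservative fill value than the mean; whether imputation is appropriate at all remains the open question.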
## [1] "N_pmid = 5774"
## [2] "mean = 4.2 ; median = 2.0 ; max = 161.8"
## [3] "90%ile = 8.5"
## [4] "(Plot truncated at 25.)"
## [1] "Associations total: 87601 ; SNPs: 59580 ; traits: 2970 ; PMIDs: 3110"
The GSYMB and MAPPED_GENE fields may include chromosomal locations or be “intergenic”.
## [1] "snp2gene: total associations: 266000 ; studies: 4902 ; snps: 60099 ; genes: 22392 ; intergenic associations: 5145 ; chromosomal location associations: 38630"
| REPORTED_OR_MAPPED | N |
|---|---|
| reported | 116747 |
| mapped_upstream | 38759 |
| mapped_downstream | 38759 |
| mapped_within | 71735 |
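
The four `REPORTED_OR_MAPPED` categories could be derived from one association record roughly as follows. This is a sketch under stated assumptions: that reported genes are comma-separated, that the upstream/downstream pair is delimited by a hyphen surrounded by spaces (so that hyphenated symbols such as HLA-DRA stay intact), and that the function and SNP identifier are hypothetical. The actual snp2gene construction is not shown in this report.

```python
def snp2gene_rows(snp, reported_genes, mapped_gene):
    """Expand one association into (snp, gene, category) rows."""
    rows = []
    for gene in reported_genes.split(","):
        gene = gene.strip()
        if gene:
            rows.append((snp, gene, "reported"))
    if " - " in mapped_gene:
        # Intergenic SNP: upstream and downstream genes (assumed " - " delimiter).
        upstream, downstream = mapped_gene.split(" - ", 1)
        rows.append((snp, upstream.strip(), "mapped_upstream"))
        rows.append((snp, downstream.strip(), "mapped_downstream"))
    else:
        for gene in mapped_gene.split(","):
            gene = gene.strip()
            if gene:
                rows.append((snp, gene, "mapped_within"))
    return rows


for row in snp2gene_rows("rs_example", "GAK, HLA-DRA", "GENE1 - GENE2"):
    print(row)
```

Each intergenic SNP yields exactly one upstream and one downstream row under this scheme, which is consistent with the equal `mapped_upstream` and `mapped_downstream` counts in the table.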
## [1] "Studies: 4902"
## [1] "MAPPED_GENE values: 22750"
## [1] "REPORTED_GENE values: 18396"
## [1] "TCRD targets: 19947 ; geneSymbols: 19736"
## [1] "GSYMBs mapped to TCRD: 13725"
## [1] "Tbio: 7995" "Tchem: 1513" "Tclin: 512" "Tdark: 3831"
The g2t table should have exactly one row for each gene-SNP-study-trait association.
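
The one-row-per-association expectation can be checked with a small duplicate-key test. The sketch below assumes rows are dictionaries; `gsymb` and `study_accession` appear in this report, while `snp` and `trait_uri` are assumed field names.

```python
from collections import Counter


def duplicated_keys(rows, key_fields=("gsymb", "snp", "study_accession", "trait_uri")):
    """Return the key tuples that occur in more than one row."""
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return [key for key, n in counts.items() if n > 1]


# Placeholder records for illustration only.
rows = [
    {"gsymb": "GENE1", "snp": "rs_a", "study_accession": "study_1", "trait_uri": "trait_x"},
    {"gsymb": "GENE1", "snp": "rs_a", "study_accession": "study_1", "trait_uri": "trait_x"},
    {"gsymb": "GENE2", "snp": "rs_b", "study_accession": "study_1", "trait_uri": "trait_x"},
]
print(duplicated_keys(rows))  # [('GENE1', 'rs_a', 'study_1', 'trait_x')]
```

An empty result means the uniqueness condition holds.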
## [1] "GTs with pvalue_mlog, g2t: 242998 ; genes: 20681 ; traits: 1601"
## [1] "GTs with or_or_beta, g2t: 242998 ; genes: 20681 ; traits: 1601"
## [1] "GTs with oddsratio, g2t: 60065 ; genes: 12177 ; traits: 913"
## [1] "GTs with beta, g2t: 137631 ; genes: 12407 ; traits: 767"
EFO = Experimental Factor Ontology. It includes Orphanet, PO, Mondo, and Uberon classes. The TSV was generated from the source OWL.
## [1] "EFO total classes: 29085"
| Ontology | N_in_gwas | N_total |
|---|---|---|
| EFO | 1719 | 9642 |
| GO | 67 | 348 |
| HP | 46 | 474 |
| Orphanet | 40 | 5989 |
| CHEBI | 2 | 1317 |
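
Counts like those in the table above could be produced by grouping class identifiers on their prefix. This is a minimal sketch, assuming identifiers of the form `PREFIX_localid` (as in `EFO_0004340`); the function names are hypothetical and the non-EFO identifiers are placeholders.

```python
from collections import Counter


def ontology_prefix(class_id):
    """Return the part of the identifier before the first underscore."""
    return class_id.split("_", 1)[0]


def count_by_ontology(all_ids, gwas_ids):
    """Map each ontology prefix to (N_in_gwas, N_total)."""
    total = Counter(ontology_prefix(i) for i in set(all_ids))
    in_gwas = Counter(ontology_prefix(i) for i in set(gwas_ids))
    return {ont: (in_gwas.get(ont, 0), n) for ont, n in total.items()}


all_ids = ["EFO_0004340", "EFO_0000692", "EFO_0005937", "GO_placeholder1", "HP_placeholder1"]
gwas_ids = ["EFO_0004340", "EFO_0000692", "GO_placeholder1"]
print(count_by_ontology(all_ids, gwas_ids))
```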
## [1] "EFO classes: 29085 ; total subclass relationships: 48420"
## [1] "GWAS trait-subclass pairs: 1280"
| trait_id | trait_name | subclass_id | subclass_name | trait_N_gwas | subclass_N_gwas |
|---|---|---|---|---|---|
| EFO_0004340 | body mass index | EFO_0005937 | longitudinal BMI measurement | 96 | 12 |
| EFO_0004340 | body mass index | EFO_0005935 | overweight body mass index status | 96 | 3 |
| EFO_0004340 | body mass index | EFO_0005936 | underweight body mass index status | 96 | 2 |
| EFO_0004340 | body mass index | EFO_0005851 | height-adjusted body mass index | 96 | 1 |
| EFO_0004340 | body mass index | EFO_0007041 | obese body mass index status | 96 | 1 |
| EFO_0000692 | schizophrenia | EFO_0004609 | treatment refractory schizophrenia | 94 | 6 |
| EFO_0000305 | breast carcinoma | EFO_1000649 | estrogen-receptor positive breast cancer | 77 | 14 |
| EFO_0000305 | breast carcinoma | EFO_1000650 | estrogen-receptor negative breast cancer | 77 | 10 |
| EFO_0000305 | breast carcinoma | EFO_1002010 | TP53 Positive Breast Carcinoma | 77 | 1 |
| EFO_0000249 | Alzheimer’s disease | EFO_1001870 | late-onset Alzheimers disease | 74 | 2 |
| EFO_0004612 | high density lipoprotein cholesterol measurement | EFO_0007805 | HDL cholesterol change measurement | 70 | 3 |
| EFO_0000270 | asthma | EFO_0004591 | childhood onset asthma | 60 | 12 |
| EFO_0000270 | asthma | EFO_1002011 | adult onset asthma | 60 | 3 |
| EFO_0004530 | triglyceride measurement | EFO_0007681 | triglyceride change measurement | 59 | 5 |
| EFO_0005842 | colorectal cancer | EFO_1000657 | rectum cancer | 59 | 2 |
| EFO_0005842 | colorectal cancer | EFO_1001480 | metastatic colorectal cancer | 59 | 2 |
| EFO_0001663 | prostate carcinoma | EFO_0000196 | metastatic prostate cancer | 56 | 2 |
| EFO_0004611 | low density lipoprotein cholesterol measurement | EFO_0007804 | LDL cholesterol change measurement | 55 | 4 |
| EFO_0000685 | rheumatoid arthritis | EFO_0003898 | ankylosing spondylitis | 46 | 7 |
| EFO_0000685 | rheumatoid arthritis | EFO_0003778 | psoriatic arthritis | 46 | 4 |
| EFO_0003923 | bone density | EFO_0007701 | spine bone mineral density | 45 | 15 |
| EFO_0001645 | coronary heart disease | EFO_0000378 | coronary artery disease | 45 | 14 |
| EFO_0003923 | bone density | EFO_0007702 | hip bone mineral density | 45 | 9 |
| EFO_0003923 | bone density | EFO_0007933 | radius bone mineral density | 45 | 3 |
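
The trait-subclass pairs listed above could be selected as sketched below, keeping only subclass relationships where both classes have at least one GWAS. The sketch assumes direct (parent, child) edges and a per-class study count; whether the report uses direct or transitive subclass relationships is not stated. The counts in the example are taken from the table, and `EFO_placeholder` is a made-up class with no studies.

```python
def gwas_trait_subclass_pairs(subclass_edges, n_gwas):
    """Return (trait, subclass, trait_N_gwas, subclass_N_gwas), most-studied first."""
    pairs = []
    for parent, child in subclass_edges:
        if n_gwas.get(parent, 0) > 0 and n_gwas.get(child, 0) > 0:
            pairs.append((parent, child, n_gwas[parent], n_gwas[child]))
    # Sort by trait study count, then subclass study count, both descending.
    pairs.sort(key=lambda p: (-p[2], -p[3]))
    return pairs


edges = [
    ("EFO_0004340", "EFO_0005935"),
    ("EFO_0004340", "EFO_0005937"),
    ("EFO_0004340", "EFO_placeholder"),
]
n_gwas = {"EFO_0004340": 96, "EFO_0005937": 12, "EFO_0005935": 3}
for pair in gwas_trait_subclass_pairs(edges, n_gwas):
    print(pair)
```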
Read gt_stats.tsv, built by gwax_gt_stats.R for GWAX. The statistics are designed to weigh evidence aggregated across studies for each gene-trait association.
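
As a rough illustration of this kind of aggregation (not a port of gwax_gt_stats.R), the sketch below groups g2t-style rows by gene and trait and computes a few of the statistics named in this report. The row layout, the `snp` and `trait` keys, and all example values are assumptions; the `rcras` column is omitted because its definition is not given here.

```python
from collections import defaultdict
from statistics import median


def gt_stats(g2t_rows):
    """Aggregate per (gene, trait): n_study, n_snp, or_median, pvalue_mlog_median."""
    groups = defaultdict(list)
    for row in g2t_rows:
        groups[(row["gsymb"], row["trait"])].append(row)
    stats = {}
    for key, rows in groups.items():
        # Odds ratios may be missing (for example when a beta is reported instead).
        odds_ratios = [r["oddsratio"] for r in rows if r["oddsratio"] is not None]
        stats[key] = {
            "n_study": len({r["study_accession"] for r in rows}),
            "n_snp": len({r["snp"] for r in rows}),
            "or_median": median(odds_ratios) if odds_ratios else None,
            "pvalue_mlog_median": median(r["pvalue_mlog"] for r in rows),
        }
    return stats


# Made-up rows for a placeholder gene and trait.
rows = [
    {"gsymb": "GENE_A", "trait": "trait_x", "study_accession": "study_1", "snp": "rs_a", "oddsratio": 1.2, "pvalue_mlog": 8.0},
    {"gsymb": "GENE_A", "trait": "trait_x", "study_accession": "study_2", "snp": "rs_a", "oddsratio": 1.3, "pvalue_mlog": 10.0},
    {"gsymb": "GENE_A", "trait": "trait_x", "study_accession": "study_2", "snp": "rs_b", "oddsratio": 1.5, "pvalue_mlog": 12.0},
]
print(gt_stats(rows))
```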
TO DO: Add EFO subclass-based evidence aggregation scores.
## [1] "nrow(gt) = 33977"
| trait | trait_ids | N_genes |
|---|---|---|
| schizophrenia | EFO_0000692 | 1716 |
| intelligence | EFO_0004337 | 1232 |
| autism spectrum disorder | EFO_0003756 | 886 |
| prostate carcinoma | EFO_0001663 | 549 |
| inflammatory bowel disease | EFO_0003767 | 527 |
| systemic lupus erythematosus | EFO_0002690 | 507 |
| age at onset | EFO_0004847 | 481 |
| Crohn’s disease | EFO_0000384 | 468 |
| lung carcinoma | EFO_0001071 | 451 |
| Alzheimer’s disease | EFO_0000249 | 433 |
| bipolar disorder | EFO_0000289 | 422 |
| asthma | EFO_0000270 | 418 |
- n_study
- median(OR)
- n_traits_this_gene - Normally prefer low value, but depends on other traits, semantics/ontology.
- n_snp - How many SNPs? But is more or fewer better?
- pval_median - Interpretation may be a challenge.

Color unmapped gray.
Plot for a selected trait:
## [1] "http://www.ebi.ac.uk/efo/EFO_0002508: Parkinson's disease"
Selects N non-dominated solutions on the 2D multi-objective (Pareto) boundary.
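
A non-dominated (Pareto) selection in two dimensions can be sketched as follows. The choice of objectives here (`n_study` and `pvalue_mlog_median`, both maximized) is an assumption for illustration, and how N is reached, for example by peeling successive fronts, is not specified in this report. The example points are taken from the first five rows of the top-hits table.

```python
def non_dominated(points):
    """Return the points not dominated by any other, maximizing both coordinates.

    A point is dominated if another point is at least as good in both
    coordinates and strictly better in at least one.
    """
    front = []
    best_y = float("-inf")
    # Sweep from the largest x; a point survives only if it improves on y.
    for x, y in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            front.append((x, y))
            best_y = y
    return front


# (n_study, pvalue_mlog_median) pairs
points = [(7, 43.0), (6, 16.398), (5, 50.0), (3, 28.0), (3, 6.0)]
print(non_dominated(points))  # [(7, 43.0), (5, 50.0)]
```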
Top hits:
| gsymb | name | fam | tdl | n_study | rcras | or_median | pvalue_mlog_median |
|---|---|---|---|---|---|---|---|
| C16orf75 | NA | NA | NA | 7 | 24.304 | 1.270 | 43.000 |
| MHC | NA | NA | NA | 6 | 17.928 | 1.240 | 16.398 |
| GAK | Cyclin-G-associated kinase | Kinase | Tchem | 5 | 19.635 | 1.260 | 50.000 |
| KIAA1267 | NA | NA | NA | 3 | 9.586 | 1.272 | 28.000 |
| CCHCR1 | Coiled-coil alpha-helical rod protein 1 | NA | Tbio | 3 | 9.508 | 1.390 | 6.000 |
| PRDM15 | PR domain zinc finger protein 15 | TF; Epigenetic | Tdark | 3 | 12.415 | 1.143 | 23.097 |
| ITGA8 | Integrin alpha-8 | NA | Tbio | 3 | 10.556 | 1.330 | 5.523 |
| GLTSCR1L | GLTSCR1-like protein | NA | Tdark | 3 | 10.059 | 1.190 | 7.523 |
| HLA-DRA | HLA class II histocompatibility antigen, DR alpha chain | NA | Tbio | 2 | 5.25 | 1.310 | 9.301 |
| PLEK | Pleckstrin | NA | Tbio | 2 | 4.247 | 1.350 | 5.699 |